Detecting Multiword Expressions by Dependency Parsing

Authors

  • István Nagy
  • Veronika Vincze
Abstract

In this poster, we present how different types of MWEs can be identified by dependency parsers in different languages. We focus on English verb-particle constructions (VPCs), Hungarian light verb constructions (LVCs) and German light verb constructions. Our experiments exploit the fact that some treebanks contain MWE-aware annotations, i.e. MWE-specific morphological or syntactic tags. For instance, the French Treebank contains explicit annotations for MWEs (Abeillé et al., 2003), and different versions of the Turkish Treebank are also annotated for MWEs (Eryiğit et al., 2011). Here, we make use of the Penn Treebank (Marcus et al., 1993), which contains annotation for VPCs, as well as the TIGER corpus (Brants et al., 2004) and the Szeged Dependency Treebank (Vincze et al., 2013), both of which contain annotation for LVCs. In these treebanks, the special relation between the two components of an MWE is marked by a distinct syntactic label. Consequently, a data-driven syntactic parser trained on a dataset annotated with this extra MWE information learns to assign these labels as well. In other words, the parser itself can identify MWEs in texts. We investigate the performance of such dependency parsers for three languages and two MWE types.
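The labeling idea above can be illustrated with a small sketch. Assume a parser has already produced labeled dependency arcs (head, dependent, label) for each sentence. Identifying MWEs then reduces to collecting the arcs whose label is MWE-specific. The parser's performance can be scored against the gold-standard arcs with precision, recall and F1. The label names `PRT` and `LVC`, the `Token` class, and the functions `extract_mwes` and `prf` are hypothetical illustrations and are not the authors' tag set or code. The actual labels depend on the treebank and on how it was converted to dependencies.

```python
from dataclasses import dataclass

# Hypothetical MWE-specific dependency labels (assumption): the real tag names
# in the converted Penn Treebank, TIGER and Szeged Dependency Treebank may differ.
MWE_LABELS = {"PRT", "LVC"}


@dataclass(frozen=True)
class Token:
    idx: int      # 1-based position of the token in its sentence
    form: str     # surface word form
    head: int     # index of the governing token (0 = artificial root)
    deprel: str   # dependency label assigned by the treebank or the parser


def extract_mwes(sentences, labels=MWE_LABELS):
    """Collect (sentence id, head, dependent, label) for every arc with an MWE label."""
    found = set()
    for sent_id, sentence in enumerate(sentences):
        for tok in sentence:
            if tok.deprel in labels:
                found.add((sent_id, tok.head, tok.idx, tok.deprel))
    return found


def prf(gold, predicted):
    """Precision, recall and F1 under exact match of head, dependent and label."""
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy English VPC data: gold-standard parses ...
GOLD = [
    [Token(1, "She", 2, "SBJ"), Token(2, "gave", 0, "ROOT"),
     Token(3, "up", 2, "PRT"), Token(4, "smoking", 2, "OBJ")],
    [Token(1, "He", 2, "SBJ"), Token(2, "looked", 0, "ROOT"),
     Token(3, "up", 2, "PRT"), Token(4, "the", 5, "NMOD"), Token(5, "word", 2, "OBJ")],
]

# ... and a hypothetical parser output that misses the second particle.
PRED = [
    GOLD[0],
    [Token(1, "He", 2, "SBJ"), Token(2, "looked", 0, "ROOT"),
     Token(3, "up", 2, "ADV"), Token(4, "the", 5, "NMOD"), Token(5, "word", 2, "OBJ")],
]

if __name__ == "__main__":
    print(extract_mwes(PRED))                        # {(0, 2, 3, 'PRT')}
    print(prf(extract_mwes(GOLD), extract_mwes(PRED)))  # (1.0, 0.5, 0.6666666666666666)
```

This sketch counts an MWE as correct only when the head, the dependent and the label all match the gold arc. That is one plausible strict criterion, assumed here rather than taken from the poster.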


Similar resources

Multiword Expressions As Dependency Subgraphs

We propose to model multiword expressions as dependency subgraphs, and realize this idea in the grammar formalism of Extensible Dependency Grammar (XDG). We extend XDG to lexicalize dependency subgraphs, and show how to compile them into simple lexical entries, amenable to parsing and generation with the existing XDG constraint solver.


USzeged: Identifying Verbal Multiword Expressions with POS Tagging and Parsing Techniques

The paper describes our system submitted for the Workshop on PARSEME’s Shared Task on automatic identification of verbal multiword expressions. It uses POS tagging and dependency parsing to identify single- and multi-token verbal MWEs in text. Our system is language-independent and competed on nine of the eighteen languages. Our paper describes how our system works and gives its error analysis f...


Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing

The integration of multiword expressions in a parsing procedure has been shown to improve accuracy in an artificial context where such expressions have been perfectly pre-identified. This paper evaluates two empirical strategies to integrate multiword units in a real constituency parsing context and shows that the results are not as promising as has sometimes been suggested. Firstly, we show th...


Joint Dependency Parsing and Multiword Expression Tokenization

Complex conjunctions and determiners are often considered as pretokenized units in parsing. This is not always realistic, since they can be ambiguous. We propose a model for joint dependency parsing and multiword expression identification, in which complex function words are represented as individual tokens linked with morphological dependencies. Our graph-based parser includes standard secondo...


English Multiword Expression-aware Dependency Parsing Including Named Entities

Because syntactic structures and spans of multiword expressions (MWEs) are independently annotated in many English syntactic corpora, they are generally inconsistent with respect to one another, which is harmful to the implementation of an aggregate system. In this work, we construct a corpus that ensures consistency between dependency structures and MWEs, including named entities. Further, we ...


Multiword Expressions in Statistical Dependency Parsing

In this paper, we investigated the impact of extracting different types of multiword expressions (MWEs) in improving the accuracy of a data-driven dependency parser for a morphologically rich language (Turkish). We showed that in the training stage, the unification of MWEs of a certain type, namely compound verb and noun formations, has a negative effect on parsing accuracy by increasing the le...



Publication date: 2014